Towards High-Frequency Tracking and Fast Edge-Aware Optimization
Computer vision has seen tremendous success in refashioning cameras from mere recording equipment into devices that can measure, understand, and sense their surroundings, and efficient algorithms have become essential both for processing the vast amounts of image data such devices generate and for enabling real-time applications like augmented reality/virtual reality (AR/VR). This dissertation advances the state of the art for AR/VR tracking systems by increasing the tracking frequency by orders of magnitude, and it proposes an efficient algorithm for the problem of edge-aware optimization.
AR/VR is a natural way of interacting with computers, one in which the physical and digital worlds coexist, and we are on the cusp of a radical change in how humans perform and interact with computing, driven by major advances in hardware and in the tracking, rendering, and display algorithms needed to enable AR/VR. Humans are sensitive to small misalignments between the real and the virtual world, so tracking at kilohertz frequencies becomes essential. Current vision-based systems fall short, as their tracking frequency is implicitly limited by the frame rate of the camera. This thesis presents a prototype system that tracks at frequencies orders of magnitude higher than state-of-the-art methods using multiple commodity cameras. The proposed system exploits characteristics of the camera traditionally considered flaws, namely rolling shutter and radial distortion. The experimental evaluation shows the effectiveness of the method for various degrees of motion.
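To see why rolling shutter, usually treated as an artifact, can raise the tracking rate, note that each image row in a rolling-shutter camera is read out at a slightly different time, so a single frame contains many distinct time samples. The sketch below illustrates this arithmetic only; the numbers and the linear-readout assumption are illustrative and not taken from the dissertation.

```python
# Hedged sketch: sub-frame timestamps from a rolling-shutter camera.
# Assumes a linear row readout spanning the whole frame period.

def row_timestamp(frame_start, row, num_rows, readout_time):
    """Capture time of a given row under a linear rolling-shutter model."""
    return frame_start + (row / num_rows) * readout_time

fps = 30.0                 # commodity camera frame rate
num_rows = 480             # image height in rows
readout_time = 1.0 / fps   # assume readout fills the frame period

# Consecutive rows are separated by readout_time / num_rows seconds,
# so the camera effectively samples the scene at fps * num_rows Hz.
row_dt = readout_time / num_rows
effective_rate_hz = 1.0 / row_dt
print(effective_rate_hz)   # 14400.0 -> kilohertz-scale sampling at 30 fps
```

Under these assumptions a plain 30 fps camera already yields row-level observations at 14.4 kHz, which is the kind of budget a kilohertz tracker can exploit.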
Furthermore, edge-aware optimization is an indispensable tool in the computer vision arsenal for accurate filtering of depth data and for image-based rendering, which are increasingly used for content creation and geometry processing in AR/VR. As applications demand ever-higher resolution and speed, there is a need for methods that scale accordingly. This dissertation proposes such an edge-aware optimization framework, one that is efficient, accurate, and scales well algorithmically, a combination of desirable traits not found jointly in the state of the art. The experiments show the effectiveness of the framework in a multitude of computer vision tasks such as computational photography and stereo.
Comment: PhD thesis
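As a point of reference for what "edge-aware optimization" means here: a classic formulation smooths a signal while leaving discontinuities intact by minimizing sum_i (u_i - f_i)^2 + lam * sum_i w_i (u_{i+1} - u_i)^2, where the weights w_i are near zero across edges of a guide signal. The 1-D sketch below solves this textbook weighted-least-squares baseline with a tridiagonal (Thomas) solver; the dissertation's actual framework is a different, faster algorithm, so this is only an illustration of the problem class.

```python
# Hedged sketch: 1-D edge-aware smoothing via weighted least squares.
# Solves the tridiagonal normal equations (I + lam * D^T W D) u = f.

def edge_aware_smooth_1d(f, w, lam):
    """Smooth f while preserving edges where w[i] ~ 0."""
    n = len(f)
    main = [1.0] * n            # main diagonal
    upper = [0.0] * (n - 1)     # super-diagonal
    for i in range(n - 1):
        main[i] += lam * w[i]
        main[i + 1] += lam * w[i]
        upper[i] = -lam * w[i]
    lower = upper[:]            # system is symmetric
    # Thomas algorithm: forward elimination, then back substitution.
    c = upper[:]
    d = list(f)
    for i in range(1, n):
        m = lower[i - 1] / main[i - 1]
        main[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    u = [0.0] * n
    u[-1] = d[-1] / main[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (d[i] - c[i] * u[i + 1]) / main[i]
    return u

# Noisy step signal; setting w = 0 across the jump preserves the edge.
f = [0.0, 0.2, -0.2, 0.1, 5.0, 5.2, 4.8, 5.1]
w = [1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
u = edge_aware_smooth_1d(f, w, 10.0)
```

Each half of the signal is flattened toward its mean while the step between index 3 and 4 survives, which is exactly the behavior depth-map filtering relies on.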
A Practical Stereo Depth System for Smart Glasses
We present the design of a productionized end-to-end stereo depth sensing
system that does pre-processing, online stereo rectification, and stereo depth
estimation with a fallback to monocular depth estimation when rectification is
unreliable. The output of our depth sensing system is then used in a novel view
generation pipeline to create 3D computational photography effects using
point-of-view images captured by smart glasses. All these steps are executed
on-device within the stringent compute budget of a mobile phone, and because we
expect users to own a wide range of smartphones, our design needs to be
general and cannot depend on particular hardware or an ML accelerator such
as a smartphone GPU. Although each of these steps is well studied, a
description of a practical system is still lacking. For such a system, all
these steps need to work in tandem with one another and fall back gracefully on
failures within the system or on less-than-ideal input data. We show how we handle
unforeseen changes to calibration, e.g., due to heat, robustly support depth
estimation in the wild, and still abide by the memory and latency constraints
required for a smooth user experience. We show that our trained models are
fast, running in less than 1 s on the CPU of a six-year-old Samsung Galaxy S8.
Our models generalize well to unseen data and achieve good results on
Middlebury and on in-the-wild images captured from the smart glasses.
Comment: Accepted at CVPR202
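The fallback structure the abstract describes can be summarized as: attempt online rectification, score its reliability, run stereo when the score is good, and otherwise degrade gracefully to the monocular estimator. The sketch below shows only that control flow; the function names, the reliability score, and the threshold are illustrative placeholders, not the paper's API.

```python
# Hedged sketch of a stereo-with-monocular-fallback dispatcher.

def estimate_depth(left, right, rectify, stereo_depth, mono_depth,
                   reliability_threshold=0.5):
    """Run stereo when rectification is trustworthy, else fall back."""
    rect_left, rect_right, reliability = rectify(left, right)
    if reliability >= reliability_threshold:
        return stereo_depth(rect_left, rect_right), "stereo"
    # Graceful degradation: rectification is unreliable (e.g. calibration
    # drifted with heat), so use the single-image estimator instead.
    return mono_depth(left), "monocular"

# Stubs standing in for the real (learned) models.
good_rectify = lambda l, r: (l, r, 0.9)   # confident rectification
bad_rectify = lambda l, r: (l, r, 0.1)    # calibration has drifted
stereo = lambda l, r: "stereo-depth-map"
mono = lambda img: "mono-depth-map"

depth, mode = estimate_depth("L", "R", good_rectify, stereo, mono)
```

Keeping the fallback decision in one place like this makes the "fail gracefully on bad input" requirement testable in isolation from the individual models.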